North and South: Danish voters and Syrian refugees

Task 1: Get spatial data for municipalities in Denmark

You can download administrative data for Denmark from the GADM dataset of global administrative boundaries, hosted by UC Davis. You do this with the getData() function in the raster package. For GADM data, you need to specify which level of administrative boundaries you wish to download (0 = country, 1 = first-level subdivisions, i.e. regions, 2 = second-level subdivisions, i.e. municipalities, etc.). Read this blog on the power of the raster package when it comes to available datasets.

Instructions:

  • Use the getData() function with level = 2 to download the boundaries of Danish municipalities.
  • Note the class of the mun_sp object and the variable called NAME_2, which should hold the name of the municipality.
  • Convert the Spatial dataframe to an sf object with st_as_sf() and project it to the Danish CRS (UTM zone 32N).
  • Use the mapview or tmap library to map all the municipalities.
  • Sort the NAME_2 field to see how the Danish municipalities are spelled. You may need to change the spellings later so that the spatial data can be joined to the attributes.
 [1] "Albertslund"       "Allerød"           "Assens"           
 [4] "Ballerup"          "Billund"           "Bornholm"         
 [7] "Brøndby"           "Brønderslev"       "Christiansø"      
[10] "Dragør"            "Egedal"            "Esbjerg"          
[13] "Fanø"              "Favrskov"          "Faxe"             
[16] "Fredensborg"       "Fredericia"        "Frederiksberg"    
[19] "Frederikshavn"     "Frederikssund"     "Furesø"           
[22] "Faaborg-Midtfyn"   "Gentofte"          "Gladsaxe"         
[25] "Glostrup"          "Greve"             "Gribskov"         
[28] "Guldborgsund"      "Haderslev"         "Halsnæs"          
[31] "Hedensted"         "Helsingør"         "Herlev"           
[34] "Herning"           "Hillerød"          "Hjørring"         
[37] "Holbæk"            "Holstebro"         "Horsens"          
[40] "Hvidovre"          "Høje Taastrup"     "Hørsholm"         
[43] "Ikast-Brande"      "Ishøj"             "Jammerbugt"       
[46] "Kalundborg"        "Kerteminde"        "Kolding"          
[49] "København"         "Køge"              "Langeland"        
[52] "Lejre"             "Lemvig"            "Lolland"          
[55] "Lyngby-Taarbæk"    "Læsø"              "Mariagerfjord"    
[58] "Middelfart"        "Morsø"             "Norddjurs"        
[61] "Nordfyns"          "Nyborg"            "Næstved"          
[64] "Odder"             "Odense"            "Odsherred"        
[67] "Randers"           "Rebild"            "Ringkøbing-Skjern"
[70] "Ringsted"          "Roskilde"          "Rudersdal"        
[73] "Rødovre"           "Samsø"             "Silkeborg"        
 [ reached getOption("max.print") -- omitted 24 entries ]
[1] 31
[1] 60
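
The Task 1 steps can be sketched as follows. This is a minimal sketch rather than the course solution: the helper name get_dk_municipalities and the download folder are made up for illustration, and EPSG:32632 is assumed because the objects printed later report "WGS 84 / UTM zone 32N". Recent raster releases flag getData() as deprecated in favour of the geodata package, so the call may warn or need replacing.

```r
# Hypothetical helper: download GADM level-2 boundaries and return them as sf.
# Needs the raster and sf packages and an internet connection when called.
get_dk_municipalities <- function(path = tempdir()) {
  # SpatialPolygonsDataFrame, one row per municipality (NAME_2 holds the name)
  mun_sp <- raster::getData("GADM", country = "DNK", level = 2, path = path)
  # Convert sp -> sf, then project to UTM zone 32N (EPSG:32632, an assumption)
  mun_sf <- sf::st_as_sf(mun_sp)
  sf::st_transform(mun_sf, crs = 32632)
}
```

Calling municipalities <- get_dk_municipalities() and then mapview::mapview(municipalities) and sort(municipalities$NAME_2) would cover the mapping and spelling checks.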

Task 2: Wrangle the attributes and join to the spatial data

In order to show something we need to connect the spatial polygons with some attributes. Here we use the prepared election results: we total the votes in each municipality and calculate each party’s percentage of the vote in 2011, 2015, and 2019.

Here we get to practice basic tidyverse functions and the use of tmap package.

  • Use the read_sheet() function from the googlesheets4 package to load the prepared election data for 2011, 2015, and 2019.
  • Remember that gs4_deauth() can help if you have difficulty getting data from GDrive.
  • Check the municipality names, paying attention to Aarhus, Vesthimmerlands, etc. Is the spelling of municipality names and election region names the same in all instances?
 [1] "Albertslund"       "Allerød"           "Assens"           
 [4] "Ballerup"          "Billund"           "Bornholm"         
 [7] "Brøndby"           "Brønderslev"       "Christiansø"      
[10] "Dragør"            "Egedal"            "Esbjerg"          
[13] "Fanø"              "Favrskov"          "Faxe"             
[16] "Fredensborg"       "Fredericia"        "Frederiksberg"    
[19] "Frederikshavn"     "Frederikssund"     "Furesø"           
[22] "Faaborg-Midtfyn"   "Gentofte"          "Gladsaxe"         
[25] "Glostrup"          "Greve"             "Gribskov"         
[28] "Guldborgsund"      "Haderslev"         "Halsnæs"          
[31] "Hedensted"         "Helsingør"         "Herlev"           
[34] "Herning"           "Hillerød"          "Hjørring"         
[37] "Holbæk"            "Holstebro"         "Horsens"          
[40] "Hvidovre"          "Høje-Taastrup"     "Hørsholm"         
[43] "Ikast-Brande"      "Ishøj"             "Jammerbugt"       
[46] "Kalundborg"        "Kerteminde"        "Kolding"          
[49] "København"         "Køge"              "Langeland"        
[52] "Lejre"             "Lemvig"            "Lolland"          
[55] "Lyngby-Taarbæk"    "Læsø"              "Mariagerfjord"    
[58] "Middelfart"        "Morsø"             "Norddjurs"        
[61] "Nordfyns"          "Nyborg"            "Næstved"          
[64] "Odder"             "Odense"            "Odsherred"        
[67] "Randers"           "Rebild"            "Ringkøbing-Skjern"
[70] "Ringsted"          "Roskilde"          "Rudersdal"        
[73] "Rødovre"           "Samsø"             "Silkeborg"        
 [ reached getOption("max.print") -- omitted 24 entries ]
[1] 99
character(0)
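
One way to check the spellings is to compare the two name vectors with setdiff() and recode the mismatches. The sketch below uses base R only. The "Høje Taastrup" / "Høje-Taastrup" difference is visible in the two listings above; the other names, and the direction of the recoding, are assumptions made for illustration.

```r
# Names as spelled in the spatial data and in the election sheet (toy subset;
# only the Høje Taastrup pair is taken from the printed listings)
gadm_names     <- c("Albertslund", "Høje Taastrup", "Århus", "Vesthimmerland")
election_names <- c("Albertslund", "Høje-Taastrup", "Aarhus", "Vesthimmerlands")

# Names present in one source but not the other reveal spelling differences
missing_in_elections <- setdiff(gadm_names, election_names)
missing_in_gadm      <- setdiff(election_names, gadm_names)

# Recode the spatial names to the election spelling with a named lookup vector
fixes <- c("Høje Taastrup"  = "Høje-Taastrup",
           "Århus"          = "Aarhus",
           "Vesthimmerland" = "Vesthimmerlands")
to_fix <- gadm_names %in% names(fixes)
gadm_names[to_fix] <- fixes[gadm_names[to_fix]]

# After recoding, no spatial name should be missing from the election data
remaining <- setdiff(gadm_names, election_names)
```

An empty result for the final setdiff() means every polygon will find its attributes in the join.
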
                           Party sum2011 sum2015 sum2019
            A. Socialdemokratiet  879615  924940  914882
             B. Radikale Venstre  336698  161009  304714
  C. Det Konservative Folkeparti  175047  118003  233865
 F. SF - Socialistisk Folkeparti  326192  147578  272304
             O. Dansk Folkeparti  436726  741746  308513
                           Total 2154278 2093276 2034278
Simple feature collection with 495 features and 11 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 441745.6 ymin: 6049775 xmax: 892801.1 ymax: 6402207
Projected CRS: WGS 84 / UTM zone 32N
# A tibble: 495 x 12
# Groups:   NAME_2, Party [495]
   NAME_2      Party                   Y2011 Y2015 Y2019 sum2011 sum2015 sum2019
 * <chr>       <chr>                   <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>
 1 Albertslund A. Socialdemokratiet     4823  4836  4464   10935   10027    9780
 2 Albertslund C. Det Konservative Fo~   547   339   620   10935   10027    9780
 3 Albertslund F. SF - Socialistisk F~  2148  1098  2092   10935   10027    9780
 4 Albertslund O. Dansk Folkeparti      1984  3116  1225   10935   10027    9780
 5 Albertslund B. Radikale Venstre      1433   638  1379   10935   10027    9780
 6 Allerød     B. Radikale Venstre      2181  1287  2359    8441    8960    9896
 7 Allerød     A. Socialdemokratiet     2414  3450  3079    8441    8960    9896
 8 Allerød     F. SF - Socialistisk F~  1156   664  1171    8441    8960    9896
 9 Allerød     C. Det Konservative Fo~  1362  1171  2380    8441    8960    9896
10 Allerød     O. Dansk Folkeparti      1328  2388   907    8441    8960    9896
# ... with 485 more rows, and 4 more variables: geometry <POLYGON [m]>,
#   pct_vote2011 <dbl>, pct_vote2015 <dbl>, pct_vote2019 <dbl>
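
The sum and pct_vote columns in the printout can be reproduced along these lines. This is a base R sketch on the five Albertslund rows shown above, not the original tidyverse pipeline. Whether the original stores percentages on a 0-100 or a 0-1 scale is not visible in the output, so the 0-100 scale here is an assumption.

```r
# The five Albertslund rows for 2015, copied from the printed tibble
votes <- data.frame(
  NAME_2 = rep("Albertslund", 5),
  Party  = c("A. Socialdemokratiet", "C. Det Konservative Folkeparti",
             "F. SF - Socialistisk Folkeparti", "O. Dansk Folkeparti",
             "B. Radikale Venstre"),
  Y2015  = c(4836, 339, 1098, 3116, 638)
)

# Municipality total, repeated on every row of that municipality
votes$sum2015 <- ave(votes$Y2015, votes$NAME_2, FUN = sum)

# Each party's share of the votes cast for these five parties (assumed 0-100)
votes$pct_vote2015 <- votes$Y2015 / votes$sum2015 * 100
```

Note that the totals cover only the five parties in the prepared sheet, so the percentages are shares among those parties rather than of all votes cast.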

Task 4: Cartogram

As you can see from the maps, the area of municipalities varies considerably. When mapping them, the large areas carry more visual “weight” than small areas, even though just as many people, or more, live in the small areas. Voters in low-density rural regions can thus visually outweigh the high-density urban populations.

One technique for correcting for this is the cartogram. This is a controlled distortion of the regions, expanding some and contracting others, so that the area of each region is proportional to a desired quantity, such as the population. The cartogram also tries to maintain the correct geography as much as possible, by keeping regions in roughly the same place relative to each other.

The cartogram package contains functions for creating cartograms. You give it a spatial data frame and the name of a column, and you get back a similar data frame but with regions distorted so that the region area is proportional to the column value of the regions.

You’ll also use the sf package for computing the areas of newly generated regions with the st_area() function.

Instructions

The elections sf object should be already loaded in your environment.

  • Load the cartogram package.
  • Filter the elections dataset to keep only the Dansk Folkeparti votes, creating a DF object.
  • Plot total electorate over municipality area for year 2015 in the DF data. Deviation from a straight line shows the degree of misrepresentation.
  • Create a cartogram scaling to the pct_vote2015 column.
  • Check that the DF voter population is proportional to the area.
  • Plot the pct_vote2015 percentage on the cartogram. Notice how some areas have relatively shrunk or grown.
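
The cartogram steps might look roughly like this. Treat it as a sketch under assumptions: the helper name make_df_cartogram is hypothetical, itermax = 5 is an arbitrary choice, and the ungroup and cast steps are precautions for the grouped tibble and mixed geometry types seen in the printout rather than something the exercise prescribes.

```r
# Hypothetical helper: build a continuous cartogram for one party.
# Needs the dplyr, sf and cartogram packages when called.
make_df_cartogram <- function(elections, party = "O. Dansk Folkeparti") {
  DF <- dplyr::ungroup(elections)        # drop the NAME_2/Party grouping
  DF <- DF[DF$Party == party, ]          # one row per municipality
  DF <- sf::st_cast(DF, "MULTIPOLYGON")  # uniform geometry type
  # Distort the polygons so that area tracks the pct_vote2015 column
  carto <- cartogram::cartogram_cont(DF, weight = "pct_vote2015", itermax = 5)
  # Store the new areas so proportionality can be checked afterwards
  carto$carto_area <- as.numeric(sf::st_area(carto))
  list(DF = DF, cartogram = carto)
}
```

With res <- make_df_cartogram(elections), a call such as plot(res$cartogram$pct_vote2015, res$cartogram$carto_area) should fall close to a straight line, and tmap::qtm(res$cartogram, fill = "pct_vote2015") maps the result.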

Task 5: Spatial autocorrelation test

If we look at the facetted tmaps, the election results in 2015 seem to be spatially correlated: specifically, the percentage of voters favoring Dansk Folkeparti increases as you move towards the German border. This trend is not as visible in the cartogram, where the growth is more apparent in Sjælland and other islands, like Samsø. How much correlation is there, really? By correlation, we mean: pick any two municipalities that are neighbors, with a shared border, and the chances are they’ll be more similar than any two random municipalities. This can be a problem when using statistical models that assume, conditional on the model, that the data points are independent.

The spdep package has functions for measures of spatial correlation, also known as spatial dependency. Computing these measures first requires you to work out which regions are neighbors via the poly2nb() function, short for “polygons to neighbors”. The result is an object of class nb. Then you can compute the test statistic and run a significance test on the null hypothesis of no spatial correlation. The significance test can either be done by Monte-Carlo or theoretical models.

In this example you’ll use the Moran’s I statistic to test the spatial correlation of the Dansk Folkeparti voters in 2015.
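
To make the statistic concrete, here is Moran’s I computed by hand in base R on a toy example. The six-region chain and its values are invented for illustration; the formula is the standard one, I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2), where z are deviations from the mean and S0 is the sum of all weights.

```r
# Six regions in a row: each region neighbours the one before and after it
n <- 6
A <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  A[i, i + 1] <- 1
  A[i + 1, i] <- 1
}

# Row standardisation: divide each row by its number of neighbours
W <- A / rowSums(A)

# A smooth gradient, so neighbouring regions have similar values
x <- c(1, 2, 3, 4, 5, 6)
z <- x - mean(x)

# Moran's I and its expectation under the null of no spatial correlation
morans_i   <- (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
expected_i <- -1 / (n - 1)
```

Here morans_i comes out at 12.5 / 17.5, roughly 0.71, far above the expectation of -0.2. The same -1 / (n - 1) rule explains an Expectation of -0.0111 in a moran.test() printout: it equals -1/90, i.e. 91 regions once 8 regions without links are dropped from 99.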

Instructions I - defining neighbors

  • Load the elections spatial dataset with attributes
  • Consider simplifying the boundaries if the data is too heavy for your computer and takes long to visualise
  • Load the spdep library and create nb object of neighbors using queen adjacency
  • Pass elections to poly2nb() to find the neighbors of each municipality polygon. Assign to nb.
  • Get the center points of each municipality by passing elections to st_centroid() and then to st_coordinates(). Assign to mun_centers.
  • Update the basic map of the DK municipalities by adding the connections.
    • In the second plot call pass nb and mun_centers.
    • Also pass add = TRUE to add to the existing plot rather than starting a new one.
Simple feature collection with 495 features and 11 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 441745.6 ymin: 6049775 xmax: 892801.1 ymax: 6402207
Projected CRS: WGS 84 / UTM zone 32N
# A tibble: 495 x 12
# Groups:   NAME_2, Party [495]
   NAME_2      Party                   Y2011 Y2015 Y2019 sum2011 sum2015 sum2019
 * <chr>       <chr>                   <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>
 1 Albertslund A. Socialdemokratiet     4823  4836  4464   10935   10027    9780
 2 Albertslund C. Det Konservative Fo~   547   339   620   10935   10027    9780
 3 Albertslund F. SF - Socialistisk F~  2148  1098  2092   10935   10027    9780
 4 Albertslund O. Dansk Folkeparti      1984  3116  1225   10935   10027    9780
 5 Albertslund B. Radikale Venstre      1433   638  1379   10935   10027    9780
 6 Allerød     B. Radikale Venstre      2181  1287  2359    8441    8960    9896
 7 Allerød     A. Socialdemokratiet     2414  3450  3079    8441    8960    9896
 8 Allerød     F. SF - Socialistisk F~  1156   664  1171    8441    8960    9896
 9 Allerød     C. Det Konservative Fo~  1362  1171  2380    8441    8960    9896
10 Allerød     O. Dansk Folkeparti      1328  2388   907    8441    8960    9896
# ... with 485 more rows, and 4 more variables: geometry <POLYGON [m]>,
#   pct_vote2011 <dbl>, pct_vote2015 <dbl>, pct_vote2019 <dbl>

Neighbour list object:
Number of regions: 99 
Number of nonzero links: 360 
Percentage nonzero weights: 3.673095 
Average number of links: 3.636364 
8 regions with no links:
4 6 43 55 57 79 84 89
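
A sketch of the neighbor-building steps is given below. The helper names build_neighbours and plot_neighbours are hypothetical. Since the neighbour list above reports 99 regions, the sketch assumes you pass an object with one row per municipality (for example a single-party subset) rather than all 495 rows.

```r
# Hypothetical helper: queen-adjacency neighbours plus polygon centroids.
# Needs the sf and spdep packages when called.
build_neighbours <- function(polys) {
  # queen = TRUE: polygons sharing at least one boundary point are neighbours
  nb <- spdep::poly2nb(polys, queen = TRUE)
  # Centroid coordinates as a two-column matrix (X, Y)
  mun_centers <- sf::st_coordinates(sf::st_centroid(sf::st_geometry(polys)))
  list(nb = nb, mun_centers = mun_centers)
}

# Hypothetical helper: draw the polygons, then overlay the neighbour links
plot_neighbours <- function(polys, nb, mun_centers) {
  plot(sf::st_geometry(polys), border = "grey")
  plot(nb, mun_centers, add = TRUE)  # add = TRUE keeps the existing map
}
```

The "8 regions with no links" in the printout are what zero.policy = TRUE has to handle in the next step.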

Instructions II - Moran’s I

Now that your neighbors are determined and centroids are computed, let’s continue with the Moran’s I statistic.

  • Create a subset with the municipalities for O. Dansk Folkeparti.
  • Feed the pct_vote2011 vector into moran.test().
    • moran.test() needs a weighted version of the nb object, which you get by calling nb2listw().
    • After you specify your neighbor object (nb), define the weights with style = "W". Here, style = "W" indicates that the weights for each spatial unit are standardized to sum to 1 (this is known as row standardization). For example, if municipality 1 has 3 neighbors, each of those neighbors gets a weight of 1/3. This allows for comparability between areas with different numbers of neighbors.
    • You will need another argument both when building the spatial weights and when running the test: zero.policy = TRUE deals with situations where an area has no neighbors under your definition of neighbor (as with many islands in Denmark). When this happens and you don’t include zero.policy = TRUE, you’ll get an error.
    • Run the test against the theoretical distribution of Moran’s I statistic. Find the p-value. Can you reject the null hypothesis of no spatial correlation?
  • Inspect a map of pct_vote2011.
  • Run another Moran’s I test, this time on pct_vote2015.
    • Use 999 Monte-Carlo iterations via moran.mc().
    • The first two arguments are the same as for moran.test().
    • You also need to pass the argument nsim = 999.
    • Note the p-value. Can you reject the null hypothesis this time?
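
Put together, the two tests could be wrapped as below. The function name run_moran is hypothetical. The calls mirror the ones echoed in the test output (nb2listw() with style = "W" and zero.policy = TRUE), though the exact code behind that output is not shown.

```r
# Hypothetical helper: analytic and Monte Carlo Moran's I for one variable.
# Needs the spdep package when called.
run_moran <- function(x, nb, nsim = 999) {
  # Row-standardised weights; zero.policy = TRUE tolerates regions with no links
  lw <- spdep::nb2listw(nb, style = "W", zero.policy = TRUE)
  list(
    # Test against the theoretical (randomisation) distribution
    analytic    = spdep::moran.test(x, lw, zero.policy = TRUE),
    # Test against nsim random permutations of x
    monte_carlo = spdep::moran.mc(x, lw, nsim = nsim, zero.policy = TRUE)
  )
}
```

For example, res <- run_moran(DF$pct_vote2015, nb) gives the p-values in res$analytic$p.value and res$monte_carlo$p.value. The Monte Carlo p-value changes slightly from run to run unless you call set.seed() first.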

    Moran I test under randomisation

data:  DF$pct_vote2015  
weights: nb2listw(nb, style = "W", zero.policy = TRUE)  n reduced by no-neighbour observations
  

Moran I statistic standard deviate = 0.9319, p-value = 0.1757
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.062967426      -0.011111111       0.006319028 

    Moran I test under randomisation

data:  DF$pct_vote2011  
weights: nb2listw(nb, style = "W", zero.policy = TRUE)  n reduced by no-neighbour observations
  

Moran I statistic standard deviate = 0.29615, p-value = 0.3836
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.012506168      -0.011111111       0.006359641 

    Monte-Carlo simulation of Moran I

data:  DF$pct_vote2015 
weights: nb2listw(nb, zero.policy = TRUE)  
number of simulations + 1: 1000 

statistic = 0.062967, observed rank = 854, p-value = 0.146
alternative hypothesis: greater

Marvellous Moran Testing! You should have found that the p-value was around 0.18 for 2015 and 0.38 for 2011, so you did not find any significant spatial correlation. In the Monte Carlo simulation for 2015, the p-value was around 0.15, so the weakly positive statistic (about 0.06) is again not significant.

Repeat the same test for Social Democrats


    Moran I test under randomisation

data:  DKSD$pct_vote2015  
weights: nb2listw(nb, style = "W", zero.policy = TRUE)  n reduced by no-neighbour observations
  

Moran I statistic standard deviate = 1.7429, p-value = 0.04067
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.128014426      -0.011111111       0.006371817 

    Moran I test under randomisation

data:  DKSD$pct_vote2011  
weights: nb2listw(nb, style = "W", zero.policy = TRUE)  n reduced by no-neighbour observations
  

Moran I statistic standard deviate = 0.42684, p-value = 0.3347
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.022736836      -0.011111111       0.006288392 

    Monte-Carlo simulation of Moran I

data:  DKSD$pct_vote2011 
weights: nb2listw(nb, zero.policy = TRUE)  
number of simulations + 1: 1000 

statistic = 0.022737, observed rank = 705, p-value = 0.295
alternative hypothesis: greater

    Monte-Carlo simulation of Moran I

data:  DKSD$pct_vote2015 
weights: nb2listw(nb, zero.policy = TRUE)  
number of simulations + 1: 1000 

statistic = 0.12801, observed rank = 963, p-value = 0.037
alternative hypothesis: greater

Phenomenal political testing. Social Democrats show somewhat more correlation in 2015. The p-value in the Moran’s I test was around 0.33 for the 2011 results and 0.04 for the 2015 results, so only the 2015 vote share shows a weak positive spatial correlation that is significant at the 5% level. The Monte Carlo simulations agree, with p-values of around 0.30 for 2011 and 0.04 for 2015.

Well done! Not as much correlation as it might seem at first sight.

Task 6: Different sorts of neighborhood: 50 km

Connect the nearest places (islands)

Simple feature collection with 99 features and 13 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 441745.6 ymin: 6049775 xmax: 892801.1 ymax: 6402207
Projected CRS: WGS 84 / UTM zone 32N
First 10 features:
      GID_0  NAME_0   GID_1      NAME_1 NL_NAME_1     GID_2      NAME_2
36828   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.1_1 Albertslund
36664   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.2_1     Allerød
36778   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.3_1    Ballerup
37010   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.4_1    Bornholm
36900   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.5_1     Brøndby
      VARNAME_2 NL_NAME_2  TYPE_2    ENGTYPE_2 CC_2   HASC_2
36828      <NA>      <NA> Kommune Municipality <NA> DK.HS.AB
36664      <NA>      <NA> Kommune Municipality <NA> DK.HS.AL
36778      <NA>      <NA> Kommune Municipality <NA> DK.HS.BA
37010      <NA>      <NA> Kommune Municipality <NA> DK.HS.BO
36900      <NA>      <NA> Kommune Municipality <NA> DK.HS.BR
                            geometry
36828 POLYGON ((712057 6173414, 7...
36664 POLYGON ((700891 6191571, 7...
36778 POLYGON ((715156 6178972, 7...
37010 POLYGON ((878103.7 6112929,...
36900 MULTIPOLYGON (((717343 6174...
 [ reached 'max' / getOption("max.print") -- omitted 5 rows ]
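
A distance-based neighbourhood can be sketched with dnearneigh() from spdep. The helper name neighbours_within is hypothetical. The 50000 threshold assumes the coordinates are in metres, which holds for the UTM zone 32N projection reported above.

```r
# Hypothetical helper: neighbours are all points within max_dist of each other.
# coords is a two-column matrix of centroid coordinates in metres.
neighbours_within <- function(coords, max_dist = 50000) {
  # d1 and d2 are the lower and upper distance bounds
  spdep::dnearneigh(coords, d1 = 0, d2 = max_dist)
}
```

With mun_centers taken from st_coordinates() on the municipality centroids, nb_50 <- neighbours_within(mun_centers) links islands to anything within 50 km. Municipalities further than that from every other centroid would still have no neighbors.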

Task 7: Different sorts of neighbourhood: k neighbors

Simple feature collection with 99 features and 13 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 441745.6 ymin: 6049775 xmax: 892801.1 ymax: 6402207
Projected CRS: WGS 84 / UTM zone 32N
First 10 features:
      GID_0  NAME_0   GID_1      NAME_1 NL_NAME_1     GID_2      NAME_2
36828   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.1_1 Albertslund
36664   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.2_1     Allerød
36778   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.3_1    Ballerup
37010   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.4_1    Bornholm
36900   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.5_1     Brøndby
      VARNAME_2 NL_NAME_2  TYPE_2    ENGTYPE_2 CC_2   HASC_2
36828      <NA>      <NA> Kommune Municipality <NA> DK.HS.AB
36664      <NA>      <NA> Kommune Municipality <NA> DK.HS.AL
36778      <NA>      <NA> Kommune Municipality <NA> DK.HS.BA
37010      <NA>      <NA> Kommune Municipality <NA> DK.HS.BO
36900      <NA>      <NA> Kommune Municipality <NA> DK.HS.BR
                            geometry
36828 POLYGON ((712057 6173414, 7...
36664 POLYGON ((700891 6191571, 7...
36778 POLYGON ((715156 6178972, 7...
37010 POLYGON ((878103.7 6112929,...
36900 MULTIPOLYGON (((717343 6174...
 [ reached 'max' / getOption("max.print") -- omitted 5 rows ]
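
For k nearest neighbors, spdep offers knearneigh() and knn2nb(). In the sketch below the helper name k_nearest_neighbours is hypothetical, and k = 3 is assumed from the knn2nb(k3) call echoed in the test output further down.

```r
# Hypothetical helper: link every point to its k nearest points.
# coords is a two-column matrix of centroid coordinates.
k_nearest_neighbours <- function(coords, k = 3) {
  k3 <- spdep::knearneigh(coords, k = k)  # knn object (named k3 for k = 3)
  spdep::knn2nb(k3)                       # convert to an nb neighbour list
}
```

Every municipality gets exactly k neighbors under this definition, so no region is left without links, but the relation is not necessarily symmetric: A can be among B’s nearest neighbors without B being among A’s.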

Task 8: Rerun Moran’s I

Now let’s rerun Moran’s I with the different definitions of neighbors.


    Moran I test under randomisation

data:  DF$pct_vote2015  
weights: nb2listw(nb_50, style = "W", zero.policy = TRUE)    

Moran I statistic standard deviate = 0.32916, p-value = 0.371
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.006073273      -0.010204082       0.002445438 

    Moran I test under randomisation

data:  DF$pct_vote2015  
weights: nb2listw(knn2nb(k3), style = "W", zero.policy = TRUE)    

Moran I statistic standard deviate = 1.4583, p-value = 0.07238
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.098900819      -0.010204082       0.005597403 

    Moran I test under randomisation

data:  DF$pct_vote2011  
weights: nb2listw(knn2nb(k3), style = "W", zero.policy = TRUE)    

Moran I statistic standard deviate = 0.87973, p-value = 0.1895
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
       0.05580711       -0.01020408        0.00563034 

    Monte-Carlo simulation of Moran I

data:  DF$pct_vote2015 
weights: nb2listw(knn2nb(k3), zero.policy = TRUE)  
number of simulations + 1: 1000 

statistic = 0.098901, observed rank = 921, p-value = 0.079
alternative hypothesis: greater